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Next-Generation Sequencing Reveals HIV- 1 -Mediated Suppression of 
T Cell Activation and RNA Processing and Regulation of Noncoding 
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ABSTRACT Next-generation sequencing (NGS) enables the highly sensitive measurement of whole transcriptomes. We report the 
first application to our knowledge of this technology to the analysis of RNA from a CD4 + T cell line infected with intact HIV. We 
sequenced the total mRNA from infected cells and detected differences in the expression of both host and viral mRNA. Viral 
reads represented a large portion of the total mapped sequencing reads: approximately 20% at 12 h postinfection (hpi) and 40% 
at 24 hpi. We also detected a small but significant suppression of T cell activation-related genes at 12 hpi. This suppression per- 
sisted and expanded by 24 hpi, providing new possible markers of virus-induced T cell cytopathology. By 24 hpi, the expression 
of over 50% of detectable host loci was also altered, indicating widespread alteration of host processes, including RNA process- 
ing, splicing, and transport to an extent not previously reported. In addition, next-generation sequencing provided insights into 
alternative viral RNA splice events and the expression of noncoding RNAs, including micro RNA host genes. 

IMPORTANCE Recent advances in sequencing technology now allow the measurement of effectively all the RNA in a cell. This ap- 
proach is especially useful for studying models of virus infection, as it allows the simultaneous measurement of both host and 
viral RNA. Using next-generation sequencing (NGS), we measured changes in total mRNA from a HIV-infected T cell line. To 
our knowledge, this is the first application of this technology to the investigation of HIV-host interactions involving intact HIV. 
We directly measured the amount of viral mRNA in infected cells and detected novel viral RNA splice variants and changes in the 
host expression of noncoding RNA species. We also detected small changes in T cell activation and other host processes during 
the early stages of viral replication that increased near the peak of viral replication, providing new candidate biomarkers of T cell 
death. 
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The hallmark of AIDS is the loss of CD4 + T cells, the primary 
target of human immunodeficiency virus type 1 (HIV-1) in- 
fection. Infected CD4 + T cells undergo fundamental changes that 
eventually result in cell death and the release of new virus particles 
( reviewed in references 1 and 2 ) . Following the uptake of virus, the 
viral genome is reverse transcribed and integrated into the host 
genome. The host machinery is then used to produce viral tran- 
scripts that are either spliced into smaller transcripts that serve as 
the template for viral proteins or left unspliced to be incorporated 
into new virus particles. Microarray analyses have shown that in- 
fected cells respond to these assaults with gene expression changes 
in a number of pathways, including apoptosis, cell cycle, choles- 
terol biosynthesis, and inflammation (3-5; reference 1 and refer- 
ences therein). 

Many of these cellular responses are also reflected at the level of 
the host organism. For example, gene expression in the lymph 
nodes of simian immunodeficiency virus (SlV)-infected Asian 
pig- tailed macaques (a model of pathogenic HIV infection) re- 
flects strong and sustained type I interferon responses (6). A sim- 



ilar initial interferon response is seen in the natural host, African 
green monkeys, but it eventually subsides and the infection re- 
solves (6). Despite intense investigation into the molecular events 
following infection in these and other studies, fundamental gaps 
in knowledge still remain. For instance, the extent to which host 
and viral gene expression respond to each other is still poorly 
understood. New technologies enabling the measurement of both 
host and viral RNA may help to address such shortcomings. 

In this study, we used next-generation sequencing (NGS) to 
examine changes in the transcriptome of T cells infected with 
replication-competent HIV. NGS offers a number of benefits for 
the study of host-pathogen interactions. Increased sensitivity and 
accuracy over microarrays allow important events such as the ini- 
tial host response to viral replication to be studied in greater detail 
(7). Also, host and viral RNA transcripts can be assayed simulta- 
neously, rather than on separate technical platforms, allowing 
greater reproducibility and increased confidence in the results. 
Finally, because NGS allows total RNA to be assayed, insights are 
not limited to annotated transcripts or even protein-coding genes. 
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Indeed, previous work in our group has shown that many non- 
coding RNA species are differentially expressed in virus-infected 
cells (8). 

We report here the effects of HIV- 1 infection on the expression 
of polyadenylated RNAs in a CD4 + T cell line. Using NGS, we 
detected small but significant changes in host gene expression af- 
fecting T cell function that coincided with the initiation of viral 
RNA production at 12 h postinfection (hpi). These changes inten- 
sified near peak viral replication at 24 hpi when a multitude of 
other host processes were disrupted as well. In addition, by using 
NGS, we observed the dramatic expansion of viral mRNA expres- 
sion and detected new viral splice events occurring during viral 
replication and differential expression of noncoding RNA species, 
including microRNA host genes. These findings provide an un- 
precedented and comprehensive view into the transcriptome- 
level changes that occur within T cells infected with replicating 
HIV. 

RESULTS 

Viral mRNA constitutes a large portion of total mRNA of HIV- 
infected cells. In this study, we investigated changes in host and 
viral transcription occurring in a T lymphoblast-based model of 
HIV infection. SUP-T1 cells (5 X 10 6 ) were infected in triplicate 
with HIV-1 strain LAI at a multiplicity of infection (MOI) of 5 
which resulted in near-complete infection at 24 hours postinfec- 
tion (hpi). The phenotype of infected cells (including appearance, 
viability, and the amount of viral mRNA and protein) was used to 
guide the selection of samples for NGS. At 12 hpi, viral mRNA 
could be detected in infected cells at appreciable levels (see Fig. SI 
in the supplemental material). Few cells expressed the viral anti- 
gen p24 Gag, and visually, the cells appeared to be intact. By 24 
hpi, high copy numbers of viral mRNA were present (Fig. SI). 
Nearly all cells showed the presence of p24 Gag antigen, syncytia 
were observed, and nonsyncytial cells had increased in volume. By 
28 hpi, the effects of viral infection were more pronounced. The 
amount ofviral RNA peaked (Fig. SI), and cell viability decreased. 
The bulk of cell death occurred between 32 and 48 hpi. 

On the basis of these observations, we selected two time points 
to analyze by NGS: 12 and 24 hpi concurrent with the beginning of 
viral RNA accumulation and the occurrence of near-peak RNA 
levels prior to cell death, respectively. We isolated RNA from in- 
fected and mock-infected replicate cell samples at these two time 
points, created cDNA libraries from polyadenylated RNA, and 
subjected the libraries to analysis by NGS. For each sample, we 
obtained on average over 21 million 75-nucleotide (nt) reads 
mapping to either HIV or human genomes. A large number of 
reads mapped to the viral genome (2.5 to 8 million depending on 
the time point, see Table SI in the supplemental material). By 
12 hpi, viral reads constituted -18% of the total mapped reads, 
and by 24 hpi, this proportion increased to -38%. Reads mapping 
to the viral genome indicated the presence of RNA splicing events. 
In addition to splicing events involving known splice sites, we also 
observed evidence of five splicing events involving one or more 
previously unobserved splice sites (see Fig. S2 in the supplemental 
material). We tested the presence of one of these sites (located 3' 
of the annotated splice acceptor site 2) by quantitative PCR 
(qPCR) and observed an amplicon size consistent with the usage 
of the site (Fig. S2A to S2C). 

HIV-1 infection results in a small set of differentially ex- 
pressed host genes at 12 hpi. To characterize the host response to 



TABLE 1 Differentially expressed (DE) host genes in HIV-infected 
SUP-T1 cells" 





No. of DE genes at: 




DE gene type and category 1 " 


12 hpi 


24 hpi 


Upregulated genes 






1< FC < 1.5 


37 


1,386 


1.5 < FC < 2.0 


5 


1,040 


FC >2 


1 


246 


Total no. of up-regulated genes 


43 


2,672 


Downregulated genes 






1< FC < 1.5 


36 


1,079 


1.5 < FC < 2.0 


24 


851 


FC >2 


3 


404 


Total no. of down-regulated genes 


63 


2,334 



" DE genes were identified from NGS data using limma out of 9,992 total gene loci 
detected in all replicates of at least one biological condition. All genes listed have 
Benjamini-Hochberg-adjusted P values of less than 0.05. 
h FC, fold change. 



HIV infection, we mapped the remaining nonviral reads to 
RefSeq-annotated human gene loci and quantified them as FPKM 
(fragments per kilobase of exon per million mapped fragments). 
We selected a set of 34 host genes whose expression spanned a 
range of values and measured the expression levels directly by 
qPCR (see Fig. S3 in the supplemental material). Measurements 
by NGS showed a high degree of correlation with qPCR results at 
both time points (R = 0.97 and R = 0.98 at 12 and 24 hpi, respec- 
tively), indicating a close match between the two methods. We 
then identified specific host genes that were differentially ex- 
pressed (DE) during infection. Comparing FPKM for each gene in 
infected cells to time-matched mock-infected cells, we found that 
a total of 106 genes were DE at 12 hpi (representing all fold 
changes with Benjamini-Hochberg-adjusted P values of less than 
0.05; Table 1). By 24 hpi, the number of DE genes had increased to 
5,006 representing over 50% of all 9,992 gene loci detected under 
stringent criteria (Table 1). Closer examination of these values 
showed that a large proportion of DE genes occurred with small- 
magnitude changes. In particular, 69% (73 of 106) and 49% (2,465 
of5,006) of the expressed genes exhibited 1-to 1.5 -fold changes at 
12 and 24 hpi, respectively (Table 1). The observed precision of 
the sequencing-based measurements (Fig. S3) supported the use 
of low-fold change thresholds. Other values for the statistical 
threshold were also used, and for all but the most stringent thresh- 
olds (when fewer than 10 genes were observed to be up- or down- 
regulated at 12 hpi), a similar predominance of small-magnitude 
changes was observed (see Table S2 in the supplemental material). 

Comparing the directionality of change at both time points, we 
found that nearly all of the DE genes at 12 hpi continued to be 
regulated in the same manner (i.e., up- or downregulated) at 
24 hpi. Specifically, 97 of the 106 DE genes at 12 hpi were also DE 
at 24 hpi (with Benjamini-Hochberg-adjusted P < 0.05), and the 
majority of these (95 of 97) exhibited the same direction of change 
relative to mock infection, often with increased degree of change 
(as indicated by more-intense red or green coloration at 24 hpi 
than at 12 hpi in a heat map of FPKM fold changes, Fig. 1). This 
indicated that regulation coincident with the beginning of viral 
replication at 12 hpi largely persisted and intensified near peak 
viral replication at 24 hpi, even for initially low fold changes. 

HIV-1 infection impacts core T cell functionality by 12 hpi. 
We determined the possible impact of these DE genes by analyzing 



2 mBio' mbio.asm.org 



September/October 201 1 Volume 2 Issue 5 e00134-11 



NGS Analysis of a HIV-infected T Cell Line 




TRIM66 

DHRS7 

KIAA1211 

BPHL 

SMARCA2 

SLC16A14 

VPS13B 

DST 

ZNF441 

SYNE2 

C6orf70 

KCTD19 

SIDT1 

PKD2 

DUSP5 

TNIK 

CCBL1 

AC AD 11 

PTPRD 

AHSA2 

EML6 

RGMB 

FERMT2 

MANBA 

LARP6 

CACHD1 

PLEKHH1 

ATP7B 

DMXL2 

ZNF211 

DUSP16 

CHST15 

TTC13 

ZNF660 

ZKSCAN4 

PCY0X1 L 

POPDC2 

EME1 

ZNF610 

ZNF880 

BZRAP1 

SLC6A20 

EGR1 



12 hpi 



24 hpi 



-2 0 2 
Log ( inf / mock) 





MYC 

CCNI2 

TCF7 

RAG1 

ZFP36L2 

HES5 

NRARP 

GNA15 

STRA6 

SLA 

HK2 

CCNB1 

CFTR 

KPNA2 

KIF20A 

TRIB1 

C2CD4B 

GADD45B 

HEXIM1 

CREB3L3 

KCNN4 

ADORA2A 

SOCS1 

IRS1 

PER2 

SEPHS2 

EFNB2 

SELPLG 

EPHA1 

SPSB1 

CABC1 

TGIF2 

SOX4 

UNG 

PPRC1 

C10orf2 

H2AFZ 

RRP1B 

SFPQ 

BCL11B 

NRN1 

PPIF 

GPR44 

CHST2 

MEPCE 

NCR2 

FAM64A 

HS6ST1 

LRRC33 

MARS2 

CHST11 

RIPK4 

KLF13 

CDC25B 

KCNK5 

IKZF1 

YWHAG 

BIRC5 

CD1D 

MAT2A 

IRF1 

TLR9 

AKAP1 



12 hpi 



24 hpi 



FIG 1 DE genes at 12 hpi and their mRNA expression levels at 12 and 24 hpi. Values shown are log 2 ratios for each individual HIV-infected (inf) replicate cell 
sample relative to averaged time-matched mock-infected cell samples. Genes were segregated by the direction of change relative to mock infection at 12 hpi, and 
hierarchical clustering was done within each directional group. Genes that were not also DE at 24 hpi (purple type) and genes that were also DE at 24 hpi but with 
changed directionality (gold type) are indicated. Annotations indicate overrepresented categories in DAVID (Fig. 2). Matches to top-scoring categories in each 
DAVID annotation cluster (matching numbers in Fig. 2) (solid black squares) and matches to related categories in the same DAVID cluster as the top-scoring 
category (solid gray squares) are indicated, reg, regulation. 



the corresponding annotations (biological processes, molecular 
functions, and pathways) using the NIH DAVID resource. We 
found that T cell activation and differentiation were the most 
overrepresented annotations among DE host genes at 12 hpi over- 
all (Fig. 2A) . Other annotations associated with DE genes included 
protein kinase regulation, DNA recombination, and caspase ac- 
tivity regulation (Fig. 2A). In addition, several of the DE genes at 
12 hpi encoded transcriptional regulators; these genes include 
EGR1, KLF13, and MYC. Interestingly, four of the nine genes DE 
at 12 hpi but not at 24 hpi also encoded transcriptional regulators 



(CREB3L3, HEXIM1, ZNF660, and ZKSCAN4), indicating regu- 
lation specific to early viral replication. 

Seven genes contributed to the overrepresentation of T cell 
activation among DE genes at 12 hpi (BCL11B, CD1D, EGR1, 
IKZF1 , IRF1 , RAG1 , and SOX4) , six of which were downregulated, 
indicating a net suppression of T cell activation (as T cell differ- 
entiation in Fig. 2). Five T cell activation-related DE genes 
(BCL11B, EGR1, IRF1, IKZF1, and SOX4) encoded transcription 
factors, some of which had known relevance to HIV infection. For 
example, the BCL11B gene product binds directly to the HIV-1 
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FIG 2 Top annotations associated with DE genes at 1 2 and 24 hpi as identified 
by DAVID. Annotations (for biological process, molecular function, or path- 
way) were identified by Fisher's exact test and then clustered by overlapping 
gene hits. The negative log 10 -transformed modified Fisher's exact test P values 
for each cluster are shown with the number of intersecting DE genes and total 
number of genes in the top-scoring gene set within each cluster indicated to the 
right of each bar. (A) Annotations at 12 hpi. DE genes were identified by 
Benjamini-Hochberg-adjusted P values of <0.05 with no fold change cutoff. 
Clustering was performed with high stringency, and all clusters with — logi 0 P 
value greater than 1.3 are shown. (B) Annotations at 24 hpi. DE genes were 
identified as defined above for panel A with average fold change of >1.5 to 
allow submaximal number of genes for DAVID analysis. Grouping was per- 
formed with medium stringency criteria, and the top 10 clusters are shown. 



long terminal repeat (LTR) and inhibits LTR expression (9). 
BCL11B downregulation may therefore allow more efficient 
HIV-1 replication. Other DE genes related to T cell activation not 
encoding transcription factors also had known relevance to HIV 
infection. For example, the CD1D product presents lipid antigens 
on the surface and is directed by HIV- 1 Nef for internalization and 
degradation (10). Downregulation of CD1D at the mRNA level 
would further reduce surface expression and facilitate immune 
evasion. 

We complemented functional analysis in DAVID by gene set 
enrichment analysis (GSEA). This method does not rely on the 
prior determination of DE genes (e.g., by applying a P value 
threshold) but instead uses a ranked gene list as input and is there- 
fore well suited for identifying annotations among genes with 
small-magnitude changes in expression (11). GSEA identified ad- 
ditional genes that were downregulated and contributed to the 
suppression of T cell activation; these genes include the CD2, CD4, 
CD7, CD28, and SIT1 genes encoding T cell-specific surface mol- 
ecules (see Fig. S4 in the supplemental material). Many of the 
genes associated with T cell activation in DAVID and GSEA also 
had roles in other pathways, indicating possible pleiotropic effects 
(e.g., ADORA2A and RAG1 which also contribute to caspase ac- 
tivity [Fig. 1]). 

HIV- 1 induces large-scale changes in transcription by 24 hpi. 

By 24 hpi, large-scale changes in the transcriptomes of HIV- 
infected cells were evident (Table 1; see the heatmap in Fig. S5A in 



B 




FIG 3 Two networks affected by DE genes at 24 hpi. (A) T cell activation- 
related DE genes at 24 hpi as connected in a de novo network based on known 
interactions in Ingenuity Pathways Analysis (IPA). A total of 23 genes DE at 24 
hpi are shown, selected from the union of T cell activation- and T cell 
differentiation-related genes as identified in DAVID. Downregulated genes 
(green) and upregulated genes (red) are indicated. (B) RNA processing-related 
ribonucleoprotein synthesis DE genes at 24 hpi as shown in a de novo network 
in IPA. 



the supplemental material). Consistent with the large number of 
DE genes at 24 hpi, we found that a wide range of biological pro- 
cesses was affected as determined by further analysis using DAVID 
(Fig. 2B; complete listing in Table S3 in the supplemental mate- 
rial). T cell activation and differentiation continued to be overrep- 
resented and were associated with an increased number of DE 
genes (as lymphocyte differentiation [Fig. 2B] ). Other fundamen- 
tal cellular functions associated with DE genes at 24 hpi included 
DNA metabolism, transcription, mRNA processing and splicing, 
translation, protein degradation, and cell cycle control (Fig. 2B). 
This breadth of functional regulation suggests that HIV-1 infec- 
tion resulted in the effective transcriptional reprogramming of 
SUP-T1 cells by 24 hpi. 

Suppression of T cell functionality persists and expands at 24 
hpi. As was the case at 12 hpi, genes related to T cell activation and 
differentiation were predominantly downregulated at 24 hpi. All 
of the T cell activation-related DE genes at 12 hpi continued to 
exhibit the same direction of change at 24 hpi (Fig. 1 ) . Other T cell 
activation-related genes DE at 24 hpi included CD3D, CD3Q, and 
RHOH, all of which encode membrane proteins crucial for T cell 
receptor function. A network depicting known interactions 
among proteins encoded by T cell activation-related DE genes at 
24 hpi showed two highly connected nodes: LCK (lymphocyte- 
specific protein kinase) and TP53 (p53), both of which were 
strongly downregulated at 24 hpi (Fig. 3A). Our observed down- 
regulation of TP53 differed from the upregulation observed in 
previous microarray studies and may indicate a cell type-specific 
effect (1). Genes involved in other core T cell functions, such as 
proliferation, survival, and antigen presentation, were also down- 
regulated at 24 hpi. These genes included CD ID, CD28, CXCR4, 
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TNFRSF4, and TREML2. TOB1 (transducer of ERBB2), a negative 
regulator of T cell activation, proliferation, and interleukin 2 
(IL-2) production, showed increased expression and may contrib- 
ute to the downregulation observed in other genes (data not 
shown). A connection between TOB1 expression and HIV-1 in- 
fection has not previously been mentioned. Overall, the increased 
number of DE genes related to T cell activation indicated in- 
creased suppression of this pathway by 24 hpi. 

Ribonudeoprotein complex biogenesis and RNA processing. 
Gene sets related to ribonudeoprotein complex biogenesis were 
the most overrepresented annotations among DE genes at 24 hpi 
(Fig. 2B). Ribonucleoproteins contribute to diverse functions 
within the cell, including microRNA synthesis, ribosomal assem- 
bly, and translation (12). This overlap was reflected at the gene 
level as well. In particular, ribonudeoprotein complex biogenesis 
shared many DE genes at 24 hpi with another top annotation 
cluster, RNA processing. This set of genes encoding the ribonu- 
deoprotein component of the RNA processing machinery was 
almost entirely downregulated and included many members of 
the heterogeneous nuclear ribonudeoprotein (hnRNP) and small 
nuclear ribonudeoprotein (snRNP) families. In a network identi- 
fying interactions among the proteins encoded by these genes, a 
number were found to be highly connected (Fig. 3B). One of these 
was HNRNPA1 whose gene product regulates the splicing of eu- 
karyotic and viral mRNA (13). HNRNP A 1 and other hnRNP gene 
products are known to affect the localization of HIV- 1 Gag/Pol 
mRNA, and their overexpression has previously been shown to 
reduce HIV-1 replication (Fig. 3B) (14). Other DE genes at 24 hpi 
related to ribonudeoprotein biogenesis and RNA processing are 
also known to affect HIV-1 replication directly; these genes in- 
clude TARDBP (TAR DNA-binding protein 43) whose protein 
binds the transactivation response (TAR) element of integrated 
HIV-1 DNA and represses HIV-1 transcription (15-17). Down- 
regulation of HNRNPA1, TARDBP, and other related genes may 
therefore allow more efficient HIV-1 replication in SUP-T1 cells. 

RNA transport. The importance of host regulation of RNA 
fate was underscored by another cluster of gene sets related to 
RNA transport (Fig. 2B). Host factors for RNA transport are 
known to be coopted by HIV- 1 to allow the export of unspliced, 
partially spliced, or fully spliced viral mRNAs out of the nucleus 

(18) . In particular, the host factors Crml (exportin 1) and Ran 
GTPase are used for the export of unspliced viral mRNA, while the 
host factor NXF1 (nuclear RNA export factor 1) facilitates the 
export of fully spliced viral mRNA in a Ran-independent manner 

(19) . In our data set, we observed no change in the expression of 
the Crml-encoding gene XPOl, but several members of the Ran 
signaling pathway were downregulated, suggesting that the overall 
export of unspliced viral RNA was suppressed (see Fig. S5B in the 
supplemental material; data not shown for XPOl). However, in 
contrast to other RNA transport-related genes, NXF1 and NXF3 
were expressed more highly in infected cells, suggesting that HIV 
infection may have selectively enhanced the export of fully spliced 
viral RNA (Fig. S5B). 

Limited upregulation of host processes at 24 hpi. Despite the 
presence of an almost equal number of up- and downregulated 
genes at 24 hpi, relatively few gene sets (functions, processes, or 
pathways) were associated with upregulated genes. When up- and 
downregulated DE genes at 24 hpi were analyzed separately, more 
annotation clusters were observed among downregulated genes 
(70 versus 13 among upregulated genes with significance defined 



by modified Fisher's exact test as P < 0.05). A similar result was 
obtained by GSEA (see Fig. S6A and S6B in the supplemental 
material). These results suggested that upregulated genes were 
distributed across many gene sets with few occurring in particular 
functions or pathways. Among the few upregulated pathways 
were stress-activated/Jnk cascade signaling and ion transport 
(Fig. S6C). Consistent with these observations, HIV-1 Nef has 
been shown to activate Jnk signaling, ultimately activating the 
caspase cascade and triggering cell death (20). The triggering of 
apoptosis at 24 hpi is consistent with our observation that infected 
cells began to die following the 24-hpi time point. 

HIV-1 infection does not trigger innate sensing at 12 hpi. 
Notably, innate immunity was absent from the processes associ- 
ated with DE genes at 12 hpi, suggesting that HIV-1 infection 
impaired T cell activation while evading virus-sensing mecha- 
nisms (Fig. 2A). With the exception of IRF1, which was down- 
regulated at 12 hpi (Fig. 1), interferon response factors (IRFs) 
(IRF2, IRF3, IRF7, and IRF9) showed no significant change in 
expression at 12 hpi (data not shown). Furthermore, specific IRF 
targets, including IRF3 target genes IFI35, IFI44, ISG15, and 
ISG20, were also not differentially expressed at this time point (21, 
22). IRF3 targets were of particular interest, as intact cytoplasmic 
HIV- 1 DNA has been shown to activate IRF3 in CD4 + T cells (23) . 
Our observed lack of IRF3 target gene expression is consistent 
with previous observations that replicating HIV-1 suppressed 
IRF3 activity (24). We also examined the levels of inflammation- 
related genes previously observed to be differentially regulated in 
HIV infection studies (1): ANXA1, B2M, and CD69 (generally 
upregulated) and CD53, CD71,IL-13, and IL-16 (generally down- 
regulated). These genes were also either not expressed at detect- 
able levels or unchanged in expression at 12 hpi (data not shown). 

Innate immunity and the inflammatory response continued to 
be absent from the significantly upregulated gene sets at 24 hpi (by 
GSEA [see Fig. S6A in the supplemental material] ). IRF1 was more 
strongly downregulated at 24 hpi than at 12 hpi, but other IRF 
genes (IRF2, IRF3, IRF7, and IRF9) continued to show no signif- 
icant change in expression (Fig. 1; data not shown). In a separate 
experiment, we confirmed that IRF1 was downregulated at 24 hpi 
by qPCR (by an average of 44%; data not shown). Other interferon 
(IFN)-inducible genes were differentially expressed at this time 
point but in different directions, such as B2M and IFI16 (both 
downregulated) and IFI30 and ISG20 (both upregulated). This 
lack of concerted change may have offset the detection of inter- 
feron response gene sets as up- or downregulated at 24 hpi by 
GSEA. Instead, at 24 hpi, the related gene sets of cellular defense 
and immune response were found to be primarily downregulated 
(by GSEA [Fig. S6B]). Several of the downregulated genes associ- 
ated with these gene sets were also associated with T cell activation; 
these genes include CD ID, CD2, CD28, RAG1, and TNFRSF4 
(Fig. S6D). Additional immune response-specific genes down- 
regulated at 24 hpi included IL4 and IL7R, suggesting that HIV-1 
may have impaired the ability of infected cells to signal other cells 
or respond to extracellular cues. 

Regulation of noncoding RNAs by HIV-1 infection. By map- 
ping sequencing reads to RefSeq transcript annotation, we also 
observed changes in the expression of several non-protein-coding 
RNA species. Specifically, reads mapping to RefSeq transcripts 
that began with NR were considered noncoding. As before, ex- 
pression of these transcripts was compared between infected and 
mock-infected samples. At 12 hpi, only one annotated noncoding 
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FIG 4 Noncoding RNAs regulated by HIV-1 infection in SUP-T1 cells as 
detected by NGS and reads mapping to one DE microRNA host gene. (A) 
Classes of RefSeq-annotated noncoding mRNAs that were differentially ex- 
pressed at 24 hpi. Reads mapping to RefSeq transcripts beginning with "NR" 
were tabulated and classified by the indicated terms (Hypoth, hypothetical). 
Names not including one of the five terms listed were considered "other." The 
numbers refer to DE RefSeq transcripts (i.e., with Benjamini-Hochberg- 
adjusted P < 0.05), while the numbers in parentheses indicate the subset of DE 
transcripts with mean fold changes of 2 or greater. (B) Stack diagram showing 
reads mapping to microRNA host gene MIR17HG (NR_027350) as visualized 
in the UCSC Genome Browser. One representative replicate sample each for 
the mock- and HIV-infected samples is shown, and each blue square indicates 
a single mapped read at a given position. The known positions of the mature 
microRNAs derived from the MIR17HG gene is shown below the mapped 
reads with the yellow background denoting the range of microRNA sequence 
positions. chr!3, chromosome 13. 



RNA (NR_003697 encoding C7orf40) was found to be differen- 
tially expressed (Benjamini-Hochberg-adjusted P < 0.05). How- 
ever, by 24 hpi, 305 annotated noncoding RNAs were found to be 
differentially expressed by the same criterion, with 73 changed by 
an average of 2-fold or greater (Fig. 4A). Several classes of non- 
coding RNAs were represented, including microRNA host genes, 
hypothetical genes and open reading frames (ORFs), small nucle- 
olar RNAs (snoRNAs), and pseudogenes (Fig. 4A). DE pseudo- 
genes often matched their protein-coding counterparts in direc- 
tionality and degree of regulation, which suggests that regulatory 
regions were maintained and polyadenylated transcripts were 
produced (data not shown). 

We also observed differential expression of four annotated mi- 
croRNA host genes in infected cells, three of which were down- 
regulated (MIR17HG, MIR142, and MIR621), while the remain- 
ing one was upregulated (MIR518F), all with 2-fold or greater 
changes. The downregulation of MIR17HG (by 2.25-fold) was of 
particular interest, as MIR17HG encodes a cluster of microRNAs, 
including miR-17, -18A, -19A, -19B, -20A, and -92 (Fig. 4B); in 



contrast, the other microRNA host genes encode only single 
microRNAs. Reads mapped to MIR17HG 3' of the mature 
microRNA sequences, indicating that the postexcised polyadenyl- 
ated product may have been detected (Fig. 4B). We also observed 
a concurrent upregulation of known targets of MIR17HG- 
encoded microRNAs, including PCAF, a host factor required for 
Tat- induced HIV-1 gene expression (25). 

DISCUSSION 

HIV infection and host noncoding RNA. In this study, we used 
NGS to measure all of the polyadenylated RNAs in CD4 + T lym- 
phoblasts infected with intact, replication-competent HIV. NGS 
offers a number of potential benefits to the study of viral infec- 
tions; among them is the ability to detect non-protein-coding 
RNAs expressed during infection. In a previous study, we used 
NGS to detect long noncoding RNAs in the lungs of severe acute 
respiratory syndrome (SARS) coronavirus (CoV)-infected mice 
(8) . Many of these RNAs were found to be differentially expressed, 
and unique signatures of infection could be identified (8). 

In this study, we also detected several noncoding RNA species 
in HIV-infected cells that were differentially expressed (DE). In 
most cases, the functions of these noncoding RNAs and their rel- 
evance to infection remains unknown. One noncoding RNA of 
interest, the microRNA host gene MIR17HG, encoded a cluster of 
microRNAs and was strongly downregulated during infection. 
While our method of RNA library preparation precluded direct 
detection of mature microRNAs, the NGS data set allowed the 
expression of both regulators and targets of microRNAs to also be 
observed. For example, we observed a concurrent upregulation of 
the known target host gene PCAF, a cellular factor required for 
HIV replication (25). In contrast to Triboulet et al. (25) who ob- 
served that PCAF was upregulated at the protein level but not the 
mRNA levels in infected peripheral blood mononuclear cells (PB- 
MCs), we found that PCAF was upregulated at the mRNA level in 
infected SUP-T1 cells. This may indicate alternative modes of 
PCAF regulation in different cell types. In addition, MIR17HG- 
encoded microRNAs have hundreds of other candidate targets in 
the TargetScan database (26). Several of these candidates were 
found to be coexpressed with PCAF in our data set, including 
KLF3 (unpublished data). KLF3 encodes a zinc finger transcrip- 
tion factor whose precise function is unknown but whose se- 
quence is located proximally to a single-nucleotide polymorphism 
strongly associated with HIV-1 plasma levels (27). We also ob- 
served differential expression of factors that may have mediated 
MIR17HG downregulation. For example, O'Donnell et al. (28) 
found that c-Myc regulates expression of MIR17HG. Consistent 
with that study, we observed that MYC and MIR17HG were con- 
cordantly expressed, as MYC was downregulated at 12 and 24 hpi 
(Fig. 1). MYC suppression by HIV-1 may therefore underlie a 
number of subsequent of transcriptional events. 

Early regulation of host transcription factors. We also ob- 
served discordance between the large amount of viral RNA pres- 
ent (-20% of total mappable RNA) and the limited amount of 
altered host gene expression at 12 hpi (-1% of detectable gene 
loci) . That is, despite the use of host machinery and the presence of 
foreign mRNA, host transcription remained relatively unchanged. 
This result suggests that at 12 hpi, viral transcription occurred on 
top of largely undisturbed transcription of host genes. The host 
genes that we detected as differentially expressed at 12 hpi showed 
that T cell activation and differentiation were suppressed in in- 
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fected cells, likely via the downregulation of T cell activation- 
specific transcription factors (BCL11B, IRF1, IKZF1, and SOX4). 
The regulation of these and transcription factors specific for other 
functions (e.g., EGR1, KLF13, and MYC) may explain how HIV 
initiated replication with minimal disruption to host gene expres- 
sion at 12 hpi but elicited larger-scale changes later in infection. 

The suppression of T cell activation that we observed was con- 
sistent with previous studies on the response of T cells to infection 
by chemokine (C-X-C motif) receptor 4 (CXCR4) -tropic versus 
chemokine (C-C motif) receptor 5 (CCR5)-tropic HIV-1 strains 
(29-31). For example, Sirois et al. (31) found that key T cell 
activation-related genes, including LCK, were downregulated at 
24 hpi by a CXCR4-tropic strain, whereas these same genes were 
upregulated by a CCR5-tropic strain. Our study utilized the 
CXCR4-tropic strain LAI, and it would be interesting to generate 
NGS data for a CCR5-tropic virus for comparison. Interestingly, 
our observations of small-magnitude differences in expression 
were consistent with those of Sirois et al. (31), who found the 
expression of T cell activation-related genes to be changed by 0.5- 
fold or less using reverse transcription-PCR (RT-PCR). 

Large-scale disruptions to host transcription near peak viral 
replication. By 24 hpi, extensive reprogramming of the host tran- 
scriptome affecting a multitude of pathways had occurred. Re- 
markably, these large-scale changes occurred without concerted 
upregulation of innate immunity at the gene set level, suggesting 
that HIV- 1 evaded viral sensing mechanisms. By this time point, 
the most affected pathways related to RNA fate determination, 
including RNA processing. While HIV-related RNA processing 
has been studied intensively, the contribution of most host factors 
remains incompletely defined (32). Modulation of this pathway 
by the virus may allow the proportion of unspliced, partially 
spliced, and fully spliced HIV RNA to be adjusted depending on 
the stage of the HIV life cycle. Our finding that over 150 genes 
related to RNA processing were differentially expressed suggests 
that HIV-1 infection results in a more complete modulation of 
RNA processing than previously identified (5, 33). However, our 
observed downregulation of RNA processing was consistent with 
results from an earlier study using NGS to investigate changes in 
CD4 + T lymphoblasts infected with an HIV-based, non- 
replication-competent vector at 24 hpi (34). Altered regulation of 
this and other pathways has been observed using microarrays as 
well, although in general, we observed changes in greater numbers 
of genes affecting these pathways using NGS (3, 5, 33, 35, 36). 

Finally, we compared our results from NGS to other types of 
data available regarding HIV-infected cells, including protein- 
protein interaction (PPI) data and small interfering RNA 
(siRNA)-based screens. In an analysis of HIV-related PPI data, 
van Dijk et al. (37) identified sets of human proteins highly con- 
nected to host and viral proteins (37). Several proteins related to T 
cell activation were identified as highly connected; these proteins 
included CD4, CXCR4, and LCK (see Table S4 in the supplemen- 
tal material). Downregulation of these members, as we observed, 
would therefore potentially affect the functions of many other 
proteins as well. Together, these data emphasize the high degree to 
which HIV-1 suppressed this pathway. In addition, many of the 
pathways associated with DE genes were identified as essential for 
HIV-1 replication in a recent meta-analysis of siRNA-based 
screen data (38). These pathways included DNA binding, RNA 
splicing, RNA export, nuclear transport, and protein complex as- 
sembly (Table S3). While the siRNA data suggest that HIV-1 ini- 



tially requires these pathways to be active to establish an infection, 
our data suggest that HIV- 1 also later suppresses and inactivates 
these pathways during active replication. 

In conclusion, using NGS, we have reported a number of 
unique findings in HIV-infected cells: the direct measurement of 
viral and host RNA, the detection of small changes in the abun- 
dance of mRNA transcripts, the identification of novel viral RNA 
splice events, and the assay of multiple forms of noncoding RNAs, 
including insights into the regulation of microRNAs. These obser- 
vations give additional insights into the panoply of changes that 
occur in host cells infected with HIV and provide the groundwork 
for using new sequencing technologies in future studies investi- 
gating the host response to viral infection. 

MATERIALS AND METHODS 

Cell lines and virus infection. SUP-T1 cells were obtained from Ameri- 
can Type Culture Collection (CRL-1942) and propagated in RPMI 1640 
medium (Gibco) supplemented with 10% fetal bovine serum (HyClone), 
penicillin (100 U/ml), streptomycin (100 u.g/ml), and GlutaMAX-I. 
HIV-1 LAI strain (catalog no. 2522) was obtained from the NIH AIDS 
Research and Reference Reagent Program (Germantown, MD) and prop- 
agated in SUP-T1 cells. U373-MAGI-CXCR4 CEM cells were obtained 
from M. Emerman through the AIDS Research and Reference Reagent 
Program, and the virus titer in these cells was measured by the protocol of 
Deminie and Emerman (39). Typical titers reached 10 7 infectious units 
per ml. Infections were carried out at a multiplicity of infection (MOI) of 
5 and performed in triplicate. Mock-infected samples received SUP-T1 
cell conditioned medium and were also performed in triplicate. The in- 
fectious dose was optimized to achieve 100% infected cells at 24 hpi with 
-50% cell viability as measured by trypan blue exclusion assay. Infected 
cells were visualized by immunofluorescence assay with rabbit HIV-1 SF2 
p24 antiserum kindly provided by BioMolecular Technologies through 
the AIDS Research and Reference Reagent Program. 

RNA preparation and next-generation sequencing. Total RNA was 
extracted from 5 X 10 6 cells per sample using a mirVana kit (Applied 
Biosystems/Ambion, Austin, TX), and the quality and concentration of 
the RNA were determined by an Agilent 2100 Bioanalyzer. Samples were 
submitted for sequencing to IlluminaFastTrack Sequencing Services 
(Hayward, CA). cDNA libraries were generated using Illumina mRNA- 
SEQ kit. The quality and concentration of these libraries were determined 
by an Agilent 2100 Bioanalyzer. The libraries were clonally amplified on a 
cluster generation station using Illumina version 4 cluster generation re- 
agents to achieve a target density of approximately 300,000 (300K)/mm 2 
in a single channel of a flow cell. The resulting libraries were then se- 
quenced on a Genome Analyzer IIx using Illumina version 4.0 sequencing 
reagents which generated single reads of 75 nucleotides (nt). Image anal- 
ysis, base calling, and error estimation were performed using Illumina 
Analysis Pipeline (version 2.6). 

Read mapping and transcript quantification. We mapped the 75-nt 
reads to human ribosomal sequences using the short-read aligner soft- 
ware Bowtie to remove potential rRNA sequences (40). We then mapped 
the remaining unmapped reads to the HIV genome (GenBank accession 
no. K02013) using the gapped aligner software TopHat, which predicts 
HIV splicing junctions and maps intron-spanning reads to known splic- 
ing junctions (41). To quantify transcript expression, we mapped all reads 
that remained unmapped to the human reference genome (hgl9, build 
GRCh37, downloaded from UCSC genome browser, http://genome.ucsc 
.edu) using the gapped aligner software TopHat. RefSeq transcript anno- 
tations were supplied to facilitate the mapping of reads spanning known 
splicing junctions. On the basis of these human genome mapping results, 
we then estimated the levels of expression at both the transcript and locus 
levels for RefSeq-annotated genes using the transcript assembly software 
Cufflinks (42). Read sequences that mapped to more than one genomic 
location were excluded from expression quantification. For visualization, 
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BAM files were generated using TopHat and SAMtools (43) and displayed 
using the UCSC Genome Browser. 

Splice site variant detection and testing. The positions of known viral 
splice sites were found in the literature (44), and the corresponding se- 
quences were identified in strain pNL4-3 (GenBank accession no. 
AF003887). Based on sequence identity to strain LAI, a list of known 
LAI-specific splice sites was generated and compared to the NGS TopHat 
output. Splice patterns involving splice sites SD 1 and SA 2/2* were tested 
by quantitative PCR (qPCR) using primers PSK027F (5' CAGGGACTTG 
AAAGCGAAAG 3'; locations 193 to 212 in HIVBRUCG accession no. 
K02013) and PSK027R (5' TGGGGCTTGTTCCATCTATC 3'; locations 
5136 to 5155). 

Quantitative reverse transcription (RT-PCR) . RNA was reverse tran- 
scribed using the QuantiTect reverse transcription kit (Qiagen, Valencia, 
CA). The resulting cDNA samples were diluted 50 X. ABI TaqMan assays 
were run for each sample in triplicate (see Fig. S3C in the supplemental 
material). Relative expression was calculated using the AAC T method 
with averaged AC T values (where C T stands for threshold cycle) for the 
OAZ1 gene as a calibrator, as the expression of OAZ1 did not significantly 
change between 12 and 24 hpi in the NGS data. Intracellular viral RNA 
load was quantified as previously described (45). Relative change of IRF1 
was determined by qPCR using primers PSK1 (forward primer [5' TCTG 
GCTTTTTCCTCTGAGC 3'] and reverse primer [5' ATGCTTTTCTGG 
GGTCACTG 3']). 

Differential expression analysis. To compare transcript expression 
across different conditions, we first normalized transcript abundances by 
the following methodology. Transcript abundance was quantified as 
FPKM (fragments per kilobase of exon per million mapped fragments) 
values estimated by Cufflinks. We chose one sample arbitrarily as a refer- 
ence. Distributions of log 2 -transformed FPKM values between the refer- 
ence and remaining samples were compared by quantile-quantile plots. 
We determined the scaling factor for each sample as the median difference 
of the corresponding quantile values of the sample and reference. Only 
genes/transcripts with raw FPKM values of >1 in all samples were con- 
sidered in the estimation of scaling factors. We retained those genes/tran- 
scripts with nonzero FPKM values in 100% of the samples of at least one 
biological condition (our detection criterion). An offset of 1 was added to 
all normalized values to facilitate the comparisons involving one or more 
FPKM values of zero and to reduce the variability of the log ratios for low 
expression values. Transcripts were mapped to RefSeq gene loci, resulting 
in 9,992 loci with detectable reads. The data are available at http://www 
.viromics.washington.edu or upon request. The normalized expression 
data were analyzed for differential expression by using linear model meth- 
ods as implemented in the R package limma (46). P values were derived 
from linear model-based t tests between infected and time-matched 
mock-infected samples. Unless otherwise noted, we defined differential 
expression by Benjamini-Hochberg-adjusted P values of less than 0.05 
based on the assumption that a false discovery rate of 5% provided an 
acceptable balance of false-positive control and statistical power. Fold 
changes (FC) were derived from comparing the means of these groups, 
and multiple groupings of fold changes were used ( 1 .0 to 1.5 FC, 1.5 to 2.0 
FC, and 2.0+ FC) based on previous observed fold change ranges ob- 
served in high-throughput and in particular NGS data in virus-infected 
systems (8). 

DAVID and GSEA analyses. To identify annotations among DE 
genes, we used the NIH DAVID resource (47, 48). Default settings were 
used in DAVID with GO_BP_FAT, GO_CC_FAT, GO_MP_FAT, BIO- 
CARTA, and KEGG _PATHWAY gene set annotations. Complementary 
analysis was performed using gene set enrichment analysis (GSEA) (11). 
Default settings were used in GSEA with Gene Ontology (GO) categories 
(c5.all.v2.5 gene sets) and Biocarta and KEGG pathways (c2.all.v2.5 gene 
sets) and 5,000 gene set-based permutations. Leading-edge analysis was 
performed within GSEA to derive genes for hierarchically clustering up- 
and downregulated gene sets. 



Network analysis. Interactions between DE genes were identified us- 
ing Ingenuity Pathways Analysis (IPA) software (Ingenuity Systems, Red- 
wood City, CA). Networks were generated within IPA based on direct, 
literature-curated interactions. Subsets of genes were selected for input 
into IPA based on DE gene overlaps with DAVID annotation clusters. 
Heat maps were generated using the R package gplots. Comparisons to 
published networks of protein-protein interaction (PPI) data were made 
using Cytoscape (49). 
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